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Abstract 

The main goal in this paper is to propose a new method for deriving 
oracle inequalities related to the exponential weighting method. For 
the sake of simplicity we focus on recovering an unknown vector from 
noisy data with the help of a family of ordered smoothers. The estima- 
tors withing this family are aggregated using the exponential weighting 
and the aim is to control the risk of the aggregated estimate. Based 
on simple probabilistic properties of the unbiased risk estimate, we de- 
rive new oracle inequalities and show that the exponential weighting 
permits to improve Kneip's oracle inequality [10]. 

1 Introduction and main results 

This paper deals with the simplest linear model 

Yi = (j,i + a£i, i = l,2,...,n, (1.1) 

where £j is a standard white Gaussian noise. For the sake of simplicity it is 
assumed that the noise level a > is known. 

The goal is to estimate an unknown vector f/, G R n based on the data 
Y = (Y\, . . . , Y n ) T . In this paper, \i is recovered with the help of linear 
estimates 

£*(y) = hiYi, heH, (1.2) 
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where % is a finite set of vectors in W 1 which will be described later on. 

In what follows, the risk of an estimate fiiY) = (p,i(Y), . . . , fi n (Y)) T is 
measured by 

R(fL,^ = E^\\jl(Y)- f i\\ 2 , 

where E M is the expectation with respect to the measure generated by 
the observations from (1.1) and ||-|| , (■, •) stand for the norm and the inner 
product in M. n 

n n 
i=l i=l 

Since the mean square risk of £i h (Y) 

R^ h ,^ = \\(l-h) f ,\\ 2 + a 2 \\h\\ 2 

depends on h £ T~L, one can minimize it choosing properly h € 7~L. Very often 
the minimal risk 

r H (u) = mm R(u h , u) 
hen 

is called the oracle risk. 

Obviously, one cannot make use of the oracle estimate 

a*(Y) = h* ■ y, h* = argmm R(u h , u) 

because it depends on the underlying vector. However, one could try to con- 
struct an estimator p^(Y) based on the family of linear estimates fi h (Y), h £ 
H, with the risk mimicking the oracle risk. This idea means that the risk of 
p^-(Y) should be bounded by the so-called oracle inequality 

ii(^, / u)<r w ( / ,)+A w ( M ), 

which holds uniformly in [i £ M. n . Heuristically, this inequality assumes 
that the remainder term A^(/i) is smaller than the oracle risk for all /i. In 
general, such an estimator doesn't exist, but for certain statistical models 
it possible to construct an estimator fl n (Y) (see, e.g., Theorem 1.1 below) 
such that: 

• A H (/j>) < Cr n (fi) for all \i G R n , where C > 1 is a constant. 

• A n (fi) < r H (^) for all [i : r n (fi) > a 2 . 

It is well-known that one can find the estimator with the above properties 
provided that % is not very rich (see, e.g., [2]). In particular, as shown in 
[10], this can be done for the so-called ordered smoothers. This is why this 
paper deals with % containing solely ordered multipliers defined as follows: 
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Definition 1.1. "H is a set of ordered multipliers if 

• hi & [0,1], i = 1, . . . ,n for all h G %, 

• hi + \ < hi, i = 1, . . . , n for all h El-L, 

• if for some integer k and some h,g G 7~L, ht < Qk, then hi < gi for all 
i = 1, . . . ,n. 

The last condition means that vectors in H may be naturally ordered, 
since for any h,g G H there are only two possibilities hi < gi or hi > gi for 
all i = 1, . . . , n. Therefore the estimators from (1.2) are often called ordered 
smoothers [10]. 

Notice that ordered smoothers are common in statistics (see, e.g., [10]). 
Below we give two basic examples, where these smoothers appear naturally. 

Smoothing splines. They are usually used in recovering smooth regression 
functions f(x), x £ [0, 1], given the noisy observations 

Zi = f{xi) + ai[, i = l,...,n, (1.3) 

where Xi G (0, 1) and ^ are i.i.d. Gaussian random variables with zero mean 
and unit variance. It is well known that smoothing spline is defined by 

f a (x,Z) = argmm{f>i - f{x t )] 2 + a j\f^\x)f^, (1.4) 

where f^ m \-) denotes the derivative of order m and a > is a smoothing 
parameter which is usually chosen with the help of the Generalized Cross 
Validation (see, e.g., [20]). 

To transform this model into the sequence space model (1.1), consider 
the Demmler-Reinsch [5] basis ipk{x), iG [0,1], k = 1, . . . ,n having double 
orthogonality 

{^k^i)n = ^ku I ^™\x)'4) { j m \x)dx = 5 k i\k, k,l = l,...,n, 
Jo 

where here and below (u, v) n stands for the inner product 

{u,v) n = y^u{xi)v(xi). 
i=l 

It is assumed for definiteness that the eigenvalues are sorted in ascending 
order Ai < . . . < A n . 
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With this basis we can represent the underlying function as follows: 

n 

f( x ) = /~2^k(x)fi k (1.5) 
k=l 

and we get from (1.3) 

Y k = (Z, ip k ) n = fj, k + a£ k . 
Next, substituting (1.5) in (1.4), we arrive at 

n 

fa(x,Y) = Jl k lj) k {x), 



k=l 



where 



fi = arg mini ^[Yfc - fi k ] 2 + a ^ X k fJ, k \ . 

11 ^k=l k=l ' 

It is seen easily that 



Yk 



1 + a\ k 

and thus, we conclude that the model (1.1)-(1.2) is equivalent to (1.3)-(1.4) 
with 

1 + at\ k 

Notice that a similar equivalence with 

h k = max(l — a\ k , 0) 

takes place in the minimax estimation of smooth regression functions from 
Sobolev's balls [17]. 

The Demmler-Reinsch basis is a very useful tool for statistical analysis 
of spline methods. In practice, this basis is rarely used since there are very 
fast algorithms for computing smoothing splines (see, e.g., [8] and [20]). 

Spectral regular izat ions of large linear models. Very often in linear 
models, we are interested in estimating X\x G W 1 based on the observations 

Z = Xfi + a£, (1.6) 

where X is a known n x p- matrix and £ is a standard white Gaussian noise. 
It is well known that if X T X has a large condition number or p is large, 
then the standard maximum likelihood estimate Xjj, (Z), where 

pP(Z) = argmin \\Z - X^f = (X T X)~ 1 X T Z 
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may result in a large risk. More precisely, if X T X > 0, then 

E||A>-X/i || 2 = a 2 p. 

Usually the risk of Xjl (Z) may be improved with the help of some reg- 
ularizations. For instance, one can use the Phillps-Tikhonov regularization 
[19] 

fi a (Z) = argmin{||Z - A>|| 2 + a||^|| 2 }, 

where a > is a smoothing parameter. It is seen easily that 

fi a (Z) = [I + a(X T X)- 1 }- 1 ft°(Z). 

This formula is a particular case of the so-called spectral regularizations 
defined as follows (see, e.g., [6]): 

jx a (Z) = H a (X T X)fi (Z), 

where H a {-) : M + — >• [0, 1] is a function depending on a smoothing parameter 
a G M + . The matrix H a (X T X) may be easily defined when H a (X), A G R + 
admits the Taylor expansion 

oo 
s=0 

Then 

oo 

H a (x T x) = h%i + J2 K(x T xy, 

8=1 

where / is the identity matrix. 

Notice that for the Phillps-Tikhonov method we have 

H a (X)= 1 X,aeR + 
1 + a/A 

and it is clear that this family of functions is ordered in the sense of Definition 
1.1. Along with the Phillps-Tikhonov regularization, the spectral cut-off 
and Landweber's iterations (see, e.g., [6] for details) are typical examples of 
ordered smoothers. 

The standard way to construct an equivalent model of the spectral reg- 
ularizations is to make use of the SVD. Let e&, k = 1, . . . ,p and Ai < A2 < 
. . . < A p be eigenvectors and eigenvalues of X T X. It is easy to check that 
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is an orthonormal basis in M n . Therefore Z defined by (1.6) can be repre- 
sented in the following equivalent form 

Y k * = (et,Z) = (el,X»)+cje k , (1.7) 

where are i.i.d. JV(0, 1). Notice also that 

Xfi a (Z) = XH a (X T X){X T X)- 1 X T Z 

and hence 

(Xfi<*(Z),el) =Y,Ys(XH a (X T X)(X T X)- 1 X T e* s ,e* k ) 

(1-8) 

= J2Y s \ k (H a (X T X)(X T X)- 1 e s ,e k ) = H a {\ k )Y k . 

s=l 

In view of (1.7) and (1.8), we see that the spectral regularization methods 
are equivalent to the statistical model defined by (1.1) and (1.2). 

Nowadays, there are a lot of approaches aimed to construct estimates 
mimicking the oracle risk. At the best of our knowledge, the principal idea 
in obtaining such estimates goes back to [1] and [13] and related to the 
method of the unbiased risk estimation [18]. The literature on this approach 
is so vast that it would be impractical to cite it here. We mention solely 
the following result by Kneip [10] since it plays an important role in our 
presentation. Denote by 

n 

f(Y,fi h ) = \\Y - fi h (Y)\\ 2 + 2a 2 Y, h i~ (1-9) 

i=l 

the unbiased risk estimate of fi h (Y). 
Theorem 1.1. Let 

h = arg min f(Y, jl h ) 
heH 

be the minimizer of the unbiased risk estimate. Then uniformly in fi € M. n , 

EJ^ - ^|| 2 < r*(/0 + Ka 2 ] Jl+ r -^, (1.10) 
where K is a universal constant. 



6 



Another idea to construct a good estimator based on the family fi h , h £ 
T~L is to aggregate the estimates within this family using a held-out sample. 
Apparently, this approach was firstly developed by Nemirovsky in [14] and 
independently by Catoni (see [3] for a summary). Later, the method was 
extended to several statistical models (see, e.g., [21], [15], [11]). 

To overcome the well-know drawbacks of sample splitting one would 
like to aggregate estimators using the same observations for constructing 
estimators and performing the aggregation. This can be done, for instance, 
with the help of the exponential weighting. The motivation of this method is 
related to the problem of functional aggregation, see [16] . It has been shown 
that this method yields rather good oracle inequalities for certain statistical 
models [12], [4], [16]. 

In context of the considered statistical model, the exponential weighting 
estimate is defined as follows: 

/z(Y) = ^ ^(Y)/An 



hay. 



where 



w h (Y) = vr^exp 



f(Y, fi h 



2/3a 2 



gen 



cxp 



2/3a 2 



P>0, 



and f(Y : jl h ) is the unbiased risk estimate of fi h (Y) defined by (1.9). 

It has been shown in [4] that for this method the following oracle in- 
equalities hold. 

Theorem 1.2. If (3 > 4, then uniformly in \x £ W 1 



R(fi, n) < min V \ h R{fi, M) + 2cr 2 /3/C(A, it) 

A h >0:||A||i=l I 



-hen 

\h . \ , r, 2 , 



R(fi, /i) < min<j R{(i\ y) + 2a 2 p log — 
where IC(-,-) is the Kullback-Leibler divergence 



(1.11) 



£(A,7r) = V A ft log ^. 

— 7T/j 

hen 



Notice that for projection methods {h^ G {0, 1}) this theorem holds for 
P > 2, see [12]. 
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It is clear that if we want to derive from (1.11) an oracle inequality 
similar to (1.10), then we have to chose 7T/j = (#%) _1 , where ffH denotes 
the cardinality of Ti, and thus we arrive at 

This oracle inequality is good only when the cardinality of T~L is not very 
large. If we deal with continuous T~L like those related to splines smoothing 
with continuous smoothing parameter, this inequality is not good. To some 
extent, this situation may be improved, see Proposition 2 in [4]. However, 
looking at the oracle inequality this proposition, unfortunately, one cannot 
say that it is better than (1.10). 

The main goal is this paper is to show that for the exponential weighting 
we can get oracle inequalities with smaller remainder terms than that one 
in Theorem 1.1, Equation (1.10). 

In order to attain this goal and to cover % with low and very hight 
cardinalities, we make use of the special prior weights defined as follows: 

^l-e X p{Jl ft+ 'l'-'l""'}. (1.12) 

Here 

h + = min{<7 £ % : g > h} 

^/imax _ where h m&x is the maximal multiplier in H, and || • ||i stands for 
the Zi-norm in R n , i.e., 



5> 



Along with these weights we will need also the following condition: 
Condition 1.1. There exist constants K , K° such that 

J^ihj - gf) > ^o(|N|i - Hslli) for allh>g from H, (1.13) 
i=i 

\\h + f < K°\\h\\ 2 for all h€n. (1.14) 

The next theorem, yielding an upper bound for the mean square risk of 
fi(Y), is the main result of this paper. 

Theorem 1.3. Assume that (3 > 4 and Conditions 1.1 hold. Then, uni- 
formly in fi £ M. n , 



E^H/i-^f <r n (fi) + 2(3a z log 



a 2 + \ a 2 



(1.15) 
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where > 0, x > is a nondecreasing function bounded at and such 

that 

T , N Cx 
\og{x) 

We finish this section with a short discussion concerning this theorem. 

Remark 1. The condition j3 > 4 may be improved when the multipliers 
h E % take only two values and 1. In this case it is sufficient to assume 
that ft > 2 (see [9]). 

Remark 2. In contrast to Proposition 2 in [4], the remainder term in (1.15) 
does not depend neither the cardinality of T~L nor n. It has the same structure 
as Kneip's oracle inequality in Theorem 1.1. 

Remark 3. Comparing (1.15) with (1.10), we see that when 



0*) 



a 2 



then the remainder terms in (1.10) and (1.15) have the same order, namely, 
Co 2 . However, when 



> 1, 



we get 



2pa 2 log 



a 2 \ a 2 



1 + r^M) 



thus showing that the upper bound for the remainder term in the oracle 
inequality related to the exponential weighting is better than that one in 
Theorem 1.1. 

Remark 4. We carried out numerous simulations to compare numerically 
the remainder terms in (1.15) and (1.10) and to find out what /3 is optimal 
from a practical viewpoint. Below we summarize what we obtained for the 
smoothing splines. 

• Nearly optimal /3 is close to 1, but unfortunately, good oracle inequal- 
ities are not available for this case. 

• There is no big difference between the exponential weighting with 
/3 = 1 and the classical unbiased risk estimation. Both methods 
demonstrate almost similar statistical performance. However, when 
r H {n)/o' 2 is close to 1, the exponential weighting works usually bet- 
ter. 
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• It seems to us that the remainder term in the oracle inequality (1.10) is 
too large. We couldn't see the square-root behavior in the simulations. 
On the other hand, the remainder term in (1.15) seems adequate to 
simulation results. 



2 Proofs 

The approach in the proof of Theorem 1.3 is based on a combination of 
methods for deriving oracle inequalities proposed in [12] and [9]. The cor- 
nerstone idea is to make use of the following property of the unbiased risk 
estimate: let 

h = argminf(Y, jl h ) 
heH 

be the minimizer of the unbiased risk estimate, then for any sufficiently small 
e < 1, there exists h > h such that with the probability 1, 

f(Y,p, h )-r(Y,ti h ) > 2f3a 2 e[\\h\\ 2 - fhf] -2f3a\ 

for all h > h € . This property means that w h (Y) are exponentially decreasing 
for large h and therefore we can obtain the following entropy bound (see 
Lemma 2.3 in the paper) 

Y, ™ h (Y) log 4 ^ lo S \ E ^ + Ce ' 1 expfCe- 1 ) 



w" 



'h<h e 



Here and in what follows, C denotes a generic constant. 
Next, we prove the following upper bound 

with the help of Lemma 2 in [7] (see Lemma 2.5 below). Finally, we combine 
these facts following the main lines in the proof of Theorem 5 in [12]. 

2.1 Auxiliary facts 

The next lemma collects some useful facts about the prior weights 7r ft defined 
by (1.12). 

Lemma 2.1. Under Condition 1.1, for any h EH, the following assertions 
hold: 
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9>h 

there exists a constant C Q such that 



^exp^Uexpj-^, (2.1) 



^2^<C \\hf + C , (2.2) 

9<h 



there exist constants ir and ir° such that 



7T < ^ ^ < 7T°. (2.3) 

3:|N| 2 <ll9l| 2 <INP + l 



Proof. Denote for brevity 

^ = 5>*exp{-Mk 



9>h 

Then we have 



ft 



S h - S h+ =7r h + exp 



I ft 

Iblli-ll^llil f Nli-ll^lh 



x E ^ex P {-Mk^}_ E ^ exp ( 

g>h+ P > g>h+ 



ft 



7r h - i 1 - exp 



/3 



Therefore in view of the definition of ir h , it is clear that if S hmax = 1, then 
S h = S h+ , thus proving (2.1). 
To prove (2.2), notice that 

g< llg + lli-||g||i 
* ~ ft 

and hence, by Conditions (1.13) and (1.14), 

E^^^E0Wi-Wi]- l|fc+lll " Ml 



9<h g<h ^ 

< ||^ + || 2 -||^n|| 2 < i^lHI 2 " \\hmin\ 



2 



K ft K ft 
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In order to check (2.3), consider the following subset in T~L 

9h = {g:\\h\\ 2 <\\g\\ 2 <\\h\\ 2 + l}. 

Let gh be the maximal element in Qh- Then there are two possibilities 

• ||rf< W + l/2, 

• \\g h \\ 2 >\\hf + l/2. 
In the first, case we have 

1/2 < \\gtf -\\g h \\ 2 <2[\\gl\\ 1 -\\g h \\ 1 ] 
and therefore by (1.12) 

n 9 > n 9h > i _ exp f J\ 9 hh ~ \\9h\\i \ = 1 _ exp^!/^)]. (2 _ 4) 

In the case, where ||/i|| 2 + 1/2 < \\g h \\ 2 < \\hf + 1, we make use of that by 
the Taylor expansion, for any g < gh 

q ^ - IIpIIi ( \9h\\ 

7T > 7, exp 



P \ P 

s llff + lli - Iblli / \\9h\\ 2 - \\h\\ 2 \ \\g^\\i - \\g\\i 

— a eX P or? — o eX P 



P "V PKo ) ~ P V PKo 

and thus, 

§ * - — * — exp {-wJ - — v — exp r^ 

1 ( 1 

This equation together with (2.4) guaranties that there exists tt such that 
for all hen 

g&Qh 

The proof of the inverse inequality ^2 ge g h it 9 < ir° is quite similar to 
that one of (2.2). ■ 

The following lemma is a cornerstone in the proof of Theorem 1.3. 
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Lemma 2.2. For f3 > 4 the risk of n(Y) is bounded from above as follows: 
EJ/2(Y) - ^|| 2 < w h (Y)r(Y,fi h ). 



hen 



Proof. It is based essentially on [12]. Recall that the unbiased risk 
estimates for fii(Y) and pf}{Y) are computed as follows (see, e.g. [18]) 

f(Y u ft) = MY) - K t ] 2 + 2a 2 - a 2 , 
f(Y h ^) = [tf{Y) - Yi\ 2 + 2a 2 h i - a 2 . 
Since YlheH w h = 1, we have 

[fi l (Y)-Y i ] 2 =Y / ™ h (Ym(Y)-Y i \ 2 

hen 

= ^ w\Y)[u i {Y) - $(Y) + p,i(Y) - Y,f 

hen 

= ™ h (X)MY) - ti(Y)\ 2 + 52 wh ( Y )M(Y) ~ Y % ] 2 

h&H hen 

+ 2 J2 ™ h (Y)\fii(Y) - $(YMHY) ~ Y t ] (2 6) 

heH 

= 52 w h {Y)[UY) - tf{Y)] 2 + 52 w h (Y)[tf(Y) - Y t ] 2 



hen hen 

+ 2 52 w h (Y)[fii(Y) - tfiYWKY) ~ MY) + MY) - **] 



hen 

= -52 w h (Y)\pi(Y) - ${Y)] 2 + 52 w h (Y)[tf(Y) - Y t ] 2 . 
hen hen 

From the definition of p,(Y) we obviously get 

d~MY) _ V w h (Y) dgiY} , V dwh{Y h h (Y) 

dYi ~ 2^ w v > gy + 2^ dYi to v > 

1 hen 1 hen 1 
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and combining this equation with (2.6) (see also (2.5)), we arrive at 
f{Y it iH) = [fc l (Y)-Y 1 ] 2 + 2a 2 



_ 2 dm{Y) ^ 2 



dYi 



hen 



^w h (Y)\[^(Y)-Y i \ 2 + 2a 



dY 



m\Y) - Hi [Y)\ + 2cj — fi { (Y) 



dY 



hen 



(2.7) 



+ Y^w h {Y){-{UY)-^(Y)f + 2a 
hen [ ~ 



2 dl og[w h (Y)] h 

dY, Mi 1 > 



w h {Y)f{Y u tf) + ™ h (Y)\ —\p,i(Y) - tf(Y)f 
hen hen ^ 

,dlog[w h (Y)] r ^ h . 



+ 2a z 



BY, 



\Pi(Y) — fii(Y)] 



In deriving the above equation it was used that Ylhen wh (Y) = 1 and hence 



hen 



dY 



hen 



To control the second sum at the right-hand of (2.7), we make use of the 
following equation 



log w h (Y) 



2/?a 2 



gen 



f (Y, pf> 



Therefore 



hen 



dY, 



2/3<r 



h ^ df(X ^ h) M(Y)-MY)]. 



hen 



dY, 



Substituting in the above equation (see (1.9)) 



dY, 



2(1 - hi) 2 Y, 
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we obtain 



Ero , (y) 8i2gW [A?(r) _ ft(y)] 

heH 1 

= -^ 2 E^( y )[^- 1 ] 2 ^-^> 



(2.8) 



hen 



where 



h = J2 ™ h (Y) hi . 



hen 



Next noticing that 

(1 - hi) 2 = (1 - h t f + - hi) 2 + 2(1 - ^)(^ - hi), 

we have 

-y, 2 w h (Y)(hi - l) 2 {hi - hi) = Y 2 (l - hi) 2 w h (Y)(hi - hi 



hen 



hen 



+Y 2 ^2 w h (Y)(hi - h % ) 2 {fn -hi + 2-2h 
hen 

= 2Y 2 Y J ™\Y)(h-h l ) 2 (l- f ±±^ 
hen 



<2^w h (Y)[fi l (Y)-tf(Y)} 2 . 
hen 

Combining this equation with (2.6)-(2.8), we finish the proof. ■ 

Lemma 2.3. Suppose {q h < 1, h G %} is a nonnegative sequence such that 
• for all h > h 



q h < exp< -e 



]>> 2 - *?) 



U, e>0. 



< 



for some h* such that \\h*\\ 2 
q 9 > q , for all g € Q h * = Ig 6 U : 



*||2 



< lor < 



||h*|| 2 + i}. 
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Let 



W h = 7r h q h 



gen 



i -l 



Then 



H(W h ) M W h log ^ < log tt" + exp[P(e)] 



hen 



h<h 



where 



PM l + , C[l + e-6log( go )] 
P(e) = log 1 



e e<?o 
Proof. Decompose % onto two subsets 

Q = {h>h}ug h *, V = H\Q. 

Denote for brevity 

p=j>v, q = j>v. 

By convexity of log(x) 

P ^ 7tV (P + Q)/P 



H(W n ) 



P + Q ^ P 



+ 



Q (P + Q)/Q 



q h /Q 



_P P + Q Q P + Q , P 

< log — h — — log — h 



+ 



P + Q 



^TrVlogi+QlogfQ) 



(2.9) 



P+Q P P + 
1 



Vies 



Next, notice that xlog(l/x) is an increasing function when x G [0, e ]. 
Therefore, using that 1 — exp(— e) > (l + e) _1 e, we get with Condition (1.13) 
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and (2.3) 



h€Q 



heg h * 



+ 5> h exp -ej^ihf-h!)-! eJ2(h!-h!) + l 



h>h 



i=l 



i=l 



< 7r° log h - n h exp \-eK Q 

q e *-^L 

h>h 



[K < 



To continue this inequality, we make use of (see (1.12)) that ir h < \\h+\\i 
\i and thus, 



E^log-Uc+^exp^ 



o il 



-)] 



h£Q 



h>h 



(2.10) 



[K t 



+ 1 \ \\V 



In order to bound from above the right-hand side at this equation, consider 
the set {h £ T-L ■ h > h}. We may assume that {h^, k = 1, . . .} in this set 
are ordered so that > Denote for brevity 



Si 



i i - " i 



With these notations we can write 
Y^e W [-eK (\\h\\i-\\h\\i)] [& 



L ) + l](||fc + ||i- Wi) 



Let us check that 



^exp[-eA' S'i] [eS* + l] (S i+ i - Si 



(2.11) 



max ^expf-eKoS'i] - < 



S k ,k>l 



i>l 



c 



(2.12) 



where [rr] + = max(0, x). Solving the equation 




dS 



- ^2 ex P [-eKoSi] [Si+i - Si] + = 0, 



i>l 
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we obtain with a simple algebra 

c c _ ex P[ eK o{Si ~ Si-i)] - 1 

bi+i — b{ — — . 

eK Q 

Hence 

exp[-eK 5j_i] - exp[-eK Q Si 



exp(-eK Si)(S i+ i - Si 



and summing up these equations, we get (2.12). 
Similar arguments may be applied to prove that 

C 

max Si exp\-eK Si] [S i+1 - Si] + < ——. 
i>i 

With Equations (2.10)-(2.13) we get 

yvrVlogl< C[1+£ " gl ° S(go)] 

and similarly 



heQ 

Therefore 



log(Q)<log^±^ 



Denote for brevity 

Q 

X 



P + Q 

Then with (2.14) and (2.16) we arrive at 



H(W h ) < max < — x log(x) — (1 — x) log(l — x) 
se[0,i] [ 

log^Tr^+x^e)), 



+ (1 - x 

where 



C[l + e] C[l + €-elog( go )] 

J?(e) = log 1 

e ego 
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It is seen easily that the minimizer x* of the right-hand side at (2.17) is a 
solution to the following equation 

logi-^=log(j>^-i? (e ) 

and thus 

** = {i+fe^)exp[-fl(e)]} ■ 

Therefore from (2.17) we get 

iTG^)<log(^7r A Vlog(l-<) 

-^[log^+logfe^)-^) 
L \%ev ' J 

= log(^ it 1 *) - log(l - x*) = log vr' 1 + e R ( £ ) 

J2^ + e R(e) 



hev 



<log 



h<h 



Lemma 2.4. Lei £j be i.i.d. jV(0, 1) and Q be a set of ordered sequences. 
Then for any a > 

, n U } C 

Emaxji^G?? - 2< 7i )(£f - 1) - a^ ff f j < -, (2.18) 

i=l i=l 



c 



E max { VVl - 5i ) 2 Ci^ - a V(l - <?;)V 2 > < -• (2-19) 
9eg it! U J a 



Proof. It follows from Lemma 2 in [7]. 
Lemma 2.5. Zei 

/i e = maxj/i : [f(Y,/i h ) - f K (F)] < 2j3eo 
where e S (0, (5/3) _1 ) and 



+ 



2/3a 2 }, (2.20) 



r (Y) = mm r(Y, n ), /i = argminrfY, a ). 

hen hen 
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Then 



2 



Proof. By the definition of r(Y,p, h ), see (1.9) and (2.20), we get 
k = max j/i : ||(1 - h)fi\\ 2 + a 2 (l - 2/3e)\\h\\ 2 

oo oo 
i=l i=l 

<||(l-^f + a\l - 2(3e)fhf 

OO OO n 

+ 2^(1 - hif^i + v 2 Y,Ch 2 i ~ 2k){g - 1) + 2/3a 2 I. 

i=l i=l ^ 

Let us fix some 7 G (0,1). Then we can rewrite the above equation as 
follows: 



/i e = maxj/i : (1 - 7 )||(1 - /i)/i|| 2 + a 2 {\ - 2/3e - 7 )IN| 2 

00 

+2aE(l " ^) 2 Mi + 7ll(l - ^ll 2 
i=l 

00 

+ a 2 J2(h?-2h i m-l)+ 1 a 2 \\h\\ 2 
i=i 

<(1 + 7)11(1 -AV|| 2 + a 2 (l-2/3e + 7 )||/ l || 2 

00 

+ 2aE(l-^) 2 M^-7ll(l-^H 2 

i=i 

00 

W E(^ 2 - 2^)(e| - 1) - 7^ 2 |N| 2 + 2(3a 2 . 
i=l ' 
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Therefore 

k < h'" = max\h : a 2 (l - 2/3e - 7 )||/i|| 2 



+ min 



+ min 



2aY,^-9i) 2 ^i+l\\(l-9)^ 
i=i 

oo 
i=l 

<(1 _ 2 pe + i) [||(1 + a 2 \\hf] 

r OO 

2a ]T(1 - 5 i) 2 Mi " (7 " 2/36)11(1 - o)m|| ; 



+ max 



+ min 

sew 



i=l 
oo 



a 2 £( 5 2 -2 ft )(e 2 -l) + 



7<r 



,2|i „i|2 



i=l 



Next, bounding max and min in this equation with the help of Lemma 
2.4, we arrive at 

(1 _ 2 pe - 7)a 2 B,\\k\\ 2 < (1 - 2/3e + 7 )E M {||(1 - &)/i|| 2 + <r 2 ||n|| 2 } 

Co 2 



+ 



7 -2/3e' 



Hence, choosing 7 = 3/3e, we get 



a 2 ^\\k\\ 2 <±±^V,{\\(l-h)u\\ 



1 - 5/3e 



+- 



0" 



2/3 + £ 
/3e 



(2.22) 



(1 - 5/3e) 

To control the expectation at the right-hand side in (2.22), notice that 
for any given g € % the following inequality 

n n n n 

£[1 - n 4 ] 2 y, 2 + 2a 2 £ h < £[1 - ft ] 2 Y, 2 + 2a 2 £ ffl 

i=l i=l i=l i=l 

holds. This yields immediately 

00 00 
||(1 - hM 2 + ^Unll 2 + 2a £(1 - + ^ " 2 ^ " !) 

i=l i=l 

00 00 

11(1 " gM? + ^ 2 ||<?l| 2 + 2a £(1 - 5i ) 2 M* + * 2 " 2 ^)^ 2 " !)■ 



< 



i=l 



i=l 
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So, for any 7 £ (0, 1), we get with this equation and Lemma 2.4 

(1 - 7 )B M {||(1 - hM\ 2 + a 2 \\h\\ 2 } < ||(1 - 5 )H| 2 + <j 2 \\g\\ 2 



+2(7 max 
9 



E(i-ft)V^-^E(i-5i) 



i=l 



+cr max 

9 



+2u max 

9 



< 



f'J 



i=l 



00 



i=l 



-s)HI 2 + - 2 NI 2 + — 

7 



Next, minimizing the right-hand side in g £ T~L, we have 
E A {||(1 - A)H| 2 + ^INI 2 } < Y^Z^ifj,) + 



(1-7)7' 



Choosing in the above display 7 = /3e and substituting thus obtained in- 
equality in (2.22), we get (2.21). ■ 



2.2 Proof of Theorem 1.3 

By the definition of w h (Y) we have 
1 



log[lV h (Y)] = -^2p?{Y, £*) + log 7T h - logj E ^ eX P 



9GW 



f (y, /is 

2/3tT 2 



2a 2 (3 



f(Y,v h ) 



1 



:f(YiA h ) + logvr ft 



log E 



ir 9 exp 



2cj 2 /3 

f(Y,fi 9 ) -f(Y,ft h ) 
2/^2 



7T 



where /i = argmin/^% r(Y, /i' 1 ). Therefore 

E w h (Y)r(Y, fl h ) = f(Y, fi k ) + 2(3a 2 E ^00 log 

hen 

-2f3a logi > ir 9 exp — 



Tu A (y) 



(2.23) 
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Next notice that for all h such that 11 
Condition (1.13) 



exp 



r(Y, jl h ) - f(Y, fi h ) 



2/3ct 2 



exp 



< INI 2 < INI 2 + 1 we have by 

1 - h)Y\\ 2 - {i - h)Y\\ 2 



2/3cr 2 



1 n 

-^2(hi-hi) 



1=1 



> exp 



Ko 



(2.24) 



We begin to control the right-hand side at (2.23) with the last term. 
Ordering the elements in 7i, we obtain 



log < ir 9 exp 



f (Y, pp) - f(Y, fi h ) 



9>h 

= log I ir 9 exp 

9>h 



213a 2 

f(Y,fi9)-f(Y,fi h y 
2/3T 2 _ 

||(i-g)y|| 2 -||(i-A)y|| 2 

2/3ct 2 



(2.25) 



i n - 1 1 r r 1 n 

gJ]lft-M [ > logj ^vr 9 exp -/li] L 

P i=l - 1 J L P i=l J J 



Thus, from (2.25) and (2.1) we get 

r(Y,fi 9 ) -r(Y,fi h ) 



log £ 



7r 9 exp 



2(3a 2 



> 0. 



(2.26) 



Our next step is to bound from above the second term at the right-hand 
side of Equation (2.23). Lemmas 2.3 and 2.5 help us in solving this problem. 
Let h e be defined by (2.20). Then for all h > h e 



[f(y,A ) -f n (Y)) > 2/3 



\h\\ 2 ] +2/3o- 2 



and in view of (2.24) we obtain with Lemma 2.3 and (2.2) 

E„ £ w\Y) log < E„ log{ £ n h + exp[R(e)] } 

< logjE^ll^f + l + exp[fl(e)]}. 
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Next, we bound E M ||/i e || 2 with the help of (2.21), thus arriving at 



£ w h (Y)r(Y,p h ) < E M f«(y) + 2/3cr 2 logj^— i 
hen 



r n (p) 



5/3e 



+ 



c 



(1 - 5/3e)e 



+ exp[i?(e)] 



(2.27) 



To finish the proof of the theorem it remains to minimize the right-hand 
side at this equation in e. Assuming that e < 1/(5/3) we obtain 



1+pe r H (a) 

x — ^ + 



C 



1 - 5/3e a 2 



(1 - 5/3e)e 



+ exp[i?(e)] 



< 



r n (p) Cer n {p) C 



+ 



H h exp 

e 



Therefore choosing 



^(x) = C min 

ee[0,l/(5/3)] 



1 fC 

ex H h exp — 

e V e 



and combining Lemma 2.2 with (2.27), we complete the proof of (1.15) since 
obviously 



^r H (Y) < r H (p). 

It is clear that ^(O) is bounded from above. It is also easy to check that 
as p — > 0, 



£*(/?) = arg min < Ce + p 

ee[o,l/(5#)] 1 



C (C 

h exp — 

e V e 



C C ( c 

log -1 — + 2 log" 2 — log log — 

P P V P 



and thus, ^(x) < Cx/\og{Cx). 
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